COMPUTER  AIDING  OF  MAINTAINABILITY  DESIGN: 


A  FEASIBILITY  STUDY 


Douglas  M.  Towne 
Mark  C.  Johnson 


November 1984
Technical  Report  No.  104 

BEHAVIORAL  TECHNOLOGY  LABORATORIES 
Department  of  Psychology 
University  of  Southern  California 


Sponsored  by 

The  Engineering  Psychology  Group 
Office  of  Naval  Research 

Under  Contract  No.  N00014-80-C-0493 


Approved  for  public  release:  distribution  unlimited 

Reproduction  in  whole  or  in  part  is  permitted 
for  any  purpose  of  the  United  States  Government. 




REPORT DOCUMENTATION PAGE

1. REPORT NUMBER: Technical Report No. 104
4. TITLE (and Subtitle): Computer Aiding of Maintainability Design
5. TYPE OF REPORT & PERIOD COVERED: Interim (9-83 to 11-84)
6. PERFORMING ORG. REPORT NUMBER: Technical Report No. 104
7. AUTHOR(s): Douglas M. Towne and Mark C. Johnson
8. CONTRACT OR GRANT NUMBER(s): N00014-80-C-0493
9. PERFORMING ORGANIZATION NAME AND ADDRESS: University of Southern California,
   Behavioral Technology Laboratories, 1845 S. Elena Ave., Redondo Beach, CA 90277
10. PROGRAM ELEMENT, PROJECT, TASK AREA & WORK UNIT NUMBERS: NR 503-003
11. CONTROLLING OFFICE NAME AND ADDRESS: Office of Naval Research, Engineering
    Psychology, 800 North Quincy St., Arlington, VA 22217
12. REPORT DATE: November 1984
13. NUMBER OF PAGES: 30
15. SECURITY CLASS. (of this report): Unclassified
16. DISTRIBUTION STATEMENT (of this Report): Approved for public release;
    distribution unlimited




ABSTRACT 


Computer-implemented processes have been developed to aid a designer in
determining  the  maintainability  consequences  of  design  decisions.  These 
processes  operate  upon  detailed  sequences  of  diagnosis  and  repair  actions 
generated  by  a  model  of  corrective  maintenance  performance,  PROFILE. 

The  design  aiding  processes  generate  summaries  of  maintenance  times, 
actions,  false  replacements,  and  other  related  maintenance  measures  to  aid 
in  the  discovery  of  maintainability  problems,  the  analysis  of  design 
options,  and  the  projection  of  expected  maintenance  workload. 

The  PROFILE  model  of  diagnosis  and  repair  performance  requires  data 
concerning  the  possible  effects  of  failures  within  the  system  under  design. 

A  general-purpose  fault  simulation  system  will  be  developed  which  will 
generate  the  required  data  from  design  specifications  of  the  type  produced 
within  conventional  CAD  systems. 

With  the  completion  of  the  fault  simulation  capability,  the  PROFILE 
model  and  its  associated  maintainability  analysis  processes  can  be  employed 
within  a  conventional  CAD  environment. 
SECTION I. BACKGROUND


During  the  years  1980  to  1983  this  organization  developed,  under  Office 
of  Naval  Research  sponsorship,  a  model  of  corrective  maintenance  performance 
which  generates  fault  isolation  and  repair  action  sequences  generally 
representative of those performed by trained technicians (Towne, Johnson,
& Corwin, 1983).

The model, PROFILE, operates upon specifications of the system design
to generate representative sequences of maintenance actions to diagnose and
repair each of a sample of faults in a system. PROFILE is a fully generic
model  of  expert  troubleshooting  behavior,  i.e.,  the  intelligence  to  select 
and  interpret  tests  is  defined  in  a  general  manner,  and  is  applied  to  any 
specific  representation  of  a  system.  The  specifications  define  the  internal 
architecture  of  the  system,  the  physical  structure  of  the  assembly,  and  the 
design  of  the  external  panels. 

Other  associated  routines  operate  upon  the  generated  action  sequences 
to  compute  the  manual  times  to  perform  each  maintenance  sequence.  From 
this  are  produced  distributions  of  repair  times  and  relevant  statistics  such 
as  Mean  Time  to  Repair  (MTTR)  and  maximum  repair  time. 

Development of the Model

PROFILE  is  implemented  as  a  computer  program  consisting  of  three 
primary  operators:  1)  a  test  selector,  2)  a  test  performer,  and  3)  a  test 
interpreter.  These  three  program  modules  attempt  to  make  testing  decisions 
very  much  like  those  of  expert  maintenance  technicians. 

Given  a  sample  fault,  the  test  selector  in  PROFILE  first  determines  the 
most  effective  test  to  perform  to  determine  the  status  of  major  sub-systems. 
The  test  performer  simulates  the  performance  of  the  selected  test  by 
obtaining, from a data base, the symptoms which the simulated fault would
produce  for  that  test.  Then  the  test  interpreter  draws  conclusions  about 
the  possible  significance  of  the  test  result,  in  light  of  any  previous 
results  obtained. 
A diagnosis sequence is generated for a sample fault by applying these
three  routines  repeatedly  until  the  true  fault  is  identified  and  resolved. 
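
The select, perform, interpret cycle described above can be sketched roughly as follows. This is a minimal illustration, not the PROFILE implementation: the function names, the small symptom table, and the selection rule (prefer the test that splits the remaining suspects most evenly) are all assumptions introduced here, and the replacement step that resolves the fault is omitted.

```python
# Hypothetical symptom data base: for each test, the set of faults that
# would make that test read abnormal. All names are illustrative only.
SYMPTOMS = {
    "power_lamp": {"psu", "fuse"},
    "disk_led": {"psu", "disk_ctrl"},
    "fuse_check": {"fuse"},
    "crt_raster": {"psu", "fuse", "crt"},
}
ALL_FAULTS = {"psu", "fuse", "disk_ctrl", "crt"}


def select_test(suspects, unused_tests):
    """Test selector: pick the test that splits the suspects most evenly."""
    def imbalance(test):
        hit = len(suspects & SYMPTOMS[test])
        return abs(2 * hit - len(suspects))
    return min(sorted(unused_tests), key=imbalance)


def perform_test(test, true_fault):
    """Test performer: look up the symptom the simulated fault produces."""
    return true_fault in SYMPTOMS[test]


def interpret(suspects, test, abnormal):
    """Test interpreter: keep only suspects consistent with the result."""
    implicated = SYMPTOMS[test]
    return suspects & implicated if abnormal else suspects - implicated


def diagnose(true_fault):
    """Apply the three routines repeatedly until one suspect remains."""
    suspects, unused, sequence = set(ALL_FAULTS), set(SYMPTOMS), []
    while len(suspects) > 1 and unused:
        test = select_test(suspects, unused)
        unused.discard(test)
        abnormal = perform_test(test, true_fault)
        suspects = interpret(suspects, test, abnormal)
        sequence.append((test, abnormal))
    return suspects, sequence
```

Calling `diagnose("psu")` returns the remaining suspect set together with the ordered list of tests and results, which plays the role of a generated diagnosis sequence in this toy setting.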

The  initial  model  of  expert  troubleshooting  behavior  attempted  to 
minimize  the  time  to  accomplish  corrective  maintenance,  without  regard  for 
the spare parts consumed. This model was exhaustively compared to detailed
troubleshooting  sequences  of  expert  technicians,  and  was  found  to  call  for 
replacement  when  further  testing  would  be  more  economical. 

In  this  study,  forty-eight  Navy  electronics  instructors  attempted  to 
individually  find  and  repair  eight  faults  in  a  small  computer  system, 
including  keyboard,  disk  drives,  CRT,  and  printer.  To  achieve  high  control 
for  this  model-development  phase,  a  computer  was  used  to  administer  the 
problems.  The  participant  selected  tests  at  the  computer  keyboard,  and  then 
viewed  a  video  tape  segment  of  the  test  being  performed,  and  results  being 
obtained.  Replacements  were  similarly  requested  at  the  keyboard,  and 
presented by video tape segments.

The  overly  narrow  objective  of  the  initial  model  caused  it  to  perform 
replacements  of  system  modules  when  real  technicians  would  ordinarily 
continue  testing.  After  lengthy  refinement  and  enhancement  of  the  model, 
the  present  PROFILE  model  emerged.  The  model's  replacement  decisions  are 
now  shaped  by  parameters  reflecting  costs  of  spare  parts,  spares 
availability,  and  urgency  of  the  repair  setting. 


Major revisions were also made in the way in which a particular
system's fault effects were represented. Initially the domain data for a
system reflected the particular symptoms produced by each possible fault.
This data form required a high degree of analysis by a human expert, and
necessitated a very large data base for a system under study. Comparison of
the  PROFILE  performances  with  the  actual  troubleshooting  sequences  revealed 
that  the  human  experts  were  not  able  to  employ  the  full  power  of  this 
symptom  data  in  interpreting  symptom  information.  As  a  result,  the  actual 
diagnosis  sequences  were  considerably  longer  than  the  PROFILE  projections. 

A  number  of  alternative  representation  forms  were  then  submitted  to  the 
model,  to  determine  if  the  symptom  data  could  somehow  be  obscured  in  a 
natural and systematic fashion, and more realistic diagnosis sequences
obtained.  A  form  was  finally  tested  which  yielded  extremely  realistic 
testing  sequences.  This  data  form  reflects  only  cause-effect  relationships 
such  as 

a  fault  in  X  MAY  affect  indicator  Y 
a  fault  in  X  WILL  affect  indicator  Y 
a  fault  in  X  CANNOT  affect  indicator  Y 

The  successful  use  of  this  simpler  fault  effect  data  also  allowed  the 
domain-specific  data  for  a  particular  system  to  be  more  compact  and  more 
easily  prepared.  In  fact,  as  is  discussed  later,  it  is  feasible  to  consider 
automated  techniques  for  the  generation  of  these  data  forms. 
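
A compact way to picture this data form is a table keyed by element and indicator, with one of the three relationships in each cell. The sketch below is only an illustration under stated assumptions: the element and indicator names are invented, a single fault is assumed, and the pruning rule (an abnormal reading rules out CANNOT entries, a normal reading rules out WILL entries) is one plausible reading of the three relationships rather than the documented PROFILE logic.

```python
from enum import Enum


class Effect(Enum):
    CANNOT = 0  # a fault in X cannot affect indicator Y
    MAY = 1     # a fault in X may affect indicator Y
    WILL = 2    # a fault in X will affect indicator Y


# Hypothetical fault-effect table: FAULT_EFFECTS[element][indicator].
FAULT_EFFECTS = {
    "power_supply": {"power_lamp": Effect.WILL, "disk_led": Effect.MAY},
    "disk_controller": {"power_lamp": Effect.CANNOT, "disk_led": Effect.WILL},
    "keyboard": {"power_lamp": Effect.CANNOT, "disk_led": Effect.CANNOT},
}


def consistent_suspects(suspects, indicator, abnormal):
    """Return the suspects that remain possible after one observation.

    An abnormal indicator rules out elements that CANNOT affect it;
    a normal indicator rules out elements that WILL affect it.
    MAY entries are never ruled out by a single observation.
    """
    ruled_out = Effect.CANNOT if abnormal else Effect.WILL
    return {s for s in suspects if FAULT_EFFECTS[s][indicator] is not ruled_out}
```

Because MAY entries survive either outcome, a table with many MAY cells yields weaker inferences per test, which is consistent with the longer, more realistic sequences reported above.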

When  changed  as  described  above,  the  PROFILE  model  produced  testing 
sequences  whose  times  corresponded  very  closely  with  the  means  of  the 
experimentally  observed  times,  for  each  problem.  Furthermore,  the  content 
of  the  generated  testing  sequences  corresponded  closely  with  that  of  the 
observed  sequences. 

As  a  validation,  a  second  study  was  performed  involving  a  different 
target system (an infrared transmitter/receiver) under two alternate
designs. In this study the technicians performed tests on the
transmitter/receiver until the fault was isolated and replaced. High correlations
(r=0.89  for  one  design  and  r=0.77  for  the  second  design)  were  obtained 
between  the  means  of  the  observed  times  for  the  problems  and  the  times 
projected  by  the  model. 


Objectives  of  the  Feasibility  Study 


The PROFILE model has potential for addressing two general needs:
1) aiding the designer at the stage of product engineering when hardware
packaging,  layout,  and  human  factors  decisions  are  made,  and  2)  projecting 
the  maintainability  workload  of  a  completed  design  proposal.  Of  these  two 
possible  applications,  the  former  is  considerably  more  challenging. 
Interacting  with  the  designer  in  productive  ways  requires  an  involvement  in 
the  design  process  itself,  whereas  after-the-fact  evaluation  of  a  system 
design  is  essentially  a  subset  of  the  larger  design  support  requirement. 

During  the  past  year  we  have  explored  the  feasibility  of  employing  the 
PROFILE  model  as  a  design  tool.  The  two  central  issues  considered  by  this 
study  have  been  1)  the  types  of  design  assistance  which  a  PROFILE-based 
technique  can  make  available  to  the  designer,  and  2)  ways  in  which  the 
required  design  specifications  can  most  easily  be  acquired. 

Design  Assistance.  Section  II  will  present  the  facilities  which  have 
been  developed  to  assist  the  designer  in  identifying  and  rectifying 
maintainability  shortcomings  in  an  emerging  design.  Operating  upon  the 
maintenance  action  projections  of  PROFILE,  these  functions  offer  the 
following: 

* distributions of corrective maintenance times

* an analysis of the utilities of the maintenance-support features
  in the design for accomplishing fault diagnosis; these include such
  design features as front panel indicators, internal test points,
  and automated test features such as BIT and ATE.

* an analysis of false replacements

* a summary of the types and frequencies of maintenance actions required
  to resolve the sample of faults, and the proportion of time required
  to perform each type.




Facilitating  Preparation  of  Design  Data.  Section  III  will  describe  the 
design  of  a  simulation  program  which  was  formulated  to  effect  substantial 
reductions  in  the  skill  and  effort  required  to  apply  PROFILE  to  a  design 
under  development.  The  program  has  been  designed  to  accept  data  of  the  type 
generally  available  from  electronic  CAD  systems,  paving  the  way  for  ultimate 
development  of  an  integrated,  computer-based  system  which  offers 
maintainability  design  aiding  within  a  conventional  CAD-based  design  process. 

Long-Term  Objectives 

Figure  1  illustrates  the  components  of  a  complete  system  for 
computer-aided  design  for  maintainability,  which  we  term  CAD-M,  and  the  role 
of  PROFILE  in  that  total  system.  The  heart  of  CAD-M  lies  in  Block  C, 
which  contains  PROFILE  and  the  cognitive  time  model,  and  in  block  B,  which 
contains  the  program  for  computing  the  time  to  accomplish  a  maintenance 
operation. 

Block  A  contains  the  simulator  designed  during  this  study  which 
accepts  high-level  inputs  describing  the  functional  architecture  of  the 
design  and  produces  the  fault-effects  data  shown  in  block  B. 

The  routines  which  seek  and  display  evidence  of  design  weaknesses  are 
shown  in  blocks  D  and  E. 

Also  shown  in  these  two  blocks  are  1)  the  true  optimum  fault  diagnosis 
program  (Towne,  Johnson,  &  Corwin,  1982)  which  was  developed  using  a  dynamic 
programming  formulation  (in  Block  D),  and  2)  a  routine  which  compares 
optimal  maintenance  performance  to  that  projected  by  PROFILE  (in  Block  E). 

If  these  are  included  in  the  total  CAD-M  system,  the  designer  can  be 
advised of the improvements which can result from aiding the maintainer's
performance  by  providing  online  decision  support.  This  decision  support 
could  be  provided  by  a  subset  of  the  CAD-M  software,  specifically  those 
functions  involved  in  producing  the  optimum  troubleshooting  sequences. 


Figure 1. Computer-Aided Design for Maintainability (CAD-M)


Current  Status 


The  functions  of  CAD-M  shown  in  blocks  B  through  E  are  now  complete 
except  for  the  following: 

*  the  Internal  Complexity  Evaluation  Program  (in  block  B) 

*  the  Cognitive  Time  Generator  (in  block  C) 

*  the  Decision  Aiding  analysis  (in  block  E) 

The  general  design  of  a  simulator  to  accept  either  CAD  outputs  or 
high-level  design  specifications  from  the  designer  is  completed,  and  is 
described  in  Section  III.  This  program  will  be  implemented  in  the  next 
year,  along  with  the  remaining  input  entry  routines  shown  in  block  A  of 
Figure 1.


SECTION  II.  INTELLIGENT  AIDING  OF  DESIGN  FOR  MAINTAINABILITY 


A  computer-based  maintainability  design  aid  may  ultimately  operate  in 
two  different  ways  to  support  the  consideration  of  maintainability  issues 
during  design.  In  the  first  mode,  designers  would  apply  the  technique 
during  the  design  cycle  to  analyze  the  maintainability  implications  of  their 
decisions  and  approach.  In  the  second  possible  mode  of  application,  the 
technique  might  be  applied  over  a  longer  term,  to  a  range  of  design 
applications,  in  order  to  derive  more  general  design  principles  which  could 
guide  designers  in  future  efforts. 

This  section  will  deal  primarily  with  the  former  application,  but  will 
conclude  with  a  brief  description  of  the  types  of  general  design 
relationships  which  might  be  derived  from  application  in  a  research  mode. 

On-Line  Aiding  of  Maintainability  Design  Decisions 

A  wide  range  of  maintainability  and  human  factors  questions  may  arise 
concerning  the  attractiveness  of  alternatives  during  the  design  of  a  complex 
system.  These  questions  might  be  classified  into  the  following  general 
categories: 

Status:

How maintainable would the system be under the current design?

What maintenance actions would be involved in maintaining this system?

What consumption of spare parts is expected?

Change Evaluation:

How would the maintainability of the current design be affected by
particular changes under consideration?

Simplification:

Can any of the maintenance features in the current system design be
eliminated without impairing the maintainability of the system?

Critical Problem Identification:

Are there serious maintainability problems in the current design?

What are they? How serious are they?

The  following  will  describe  data  summaries  produced  by  CAD-M  to  assist 
the  designer  in  seeking  answers  to  these  four  types  of  questions.  The 
summaries  are  produced  by  programs  which  operate  upon  PROFILE-generated 
action  sequences,  for  the  sample  of  faults  analyzed.  To  be  meaningful,  this 
sample  must  be  constructed  in  a  way  which  reflects  the  estimated  failure 
probabilities  of  the  system  elements. 
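
One way to construct such a sample is to draw faults with probability proportional to each element's estimated failure rate. The sketch below assumes this proportional-sampling approach; the element names, the rate values, and the function name `draw_fault_sample` are hypothetical and are not taken from the report.

```python
import random

# Hypothetical relative failure rates for system elements (illustrative).
FAILURE_RATES = {
    "power_supply": 5.0,
    "disk_controller": 3.0,
    "keyboard": 1.0,
    "crt": 1.0,
}


def draw_fault_sample(rates, n, seed=0):
    """Draw n faults with probability proportional to each failure rate."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    elements = sorted(rates)
    weights = [rates[e] for e in elements]
    return rng.choices(elements, weights=weights, k=n)
```

With these rates, roughly half of a large sample would be power supply faults, so the summaries computed over the sample weight each element by how often it is expected to fail.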

Status  Summaries.  The  maintainability  status  of  a  current  design  is 
conveyed  to  the  designer  with  three  summaries: 

a.  a  distribution  of  maintenance  times  (diagnosis  plus  repair), 
along  with  Mean  Time  to  Repair  and  standard  deviation,  as  shown 
in  Figure  2.  Currently,  a  single  time  distribution  is  produced, 
for  the  entire  system.  In  the  future,  when  systems  are  defined 
hierarchically,  as  described  in  section  III,  the  time  distributions 
and  statistics  will  be  obtainable  for  each  unit  in  a  system  or 
sub-system.  This  will  allow  comparison,  for  example,  of  repair 
times  for  one  circuit  board  to  those  of  another  board,  or  repairs 
of  one  module  to  another. 

b.  a  summary  of  maintenance  actions  performed  to  resolve  the  faults 
analyzed  by  PROFILE,  along  with  the  time  devoted  to  each  action. 

An example of this work content summary is shown in Figure 3.

c. an analysis of replacements projected for the sample of faults
analyzed, as shown in Figure 4. As opposed to a replacement
projection  based  entirely  upon  reliability  estimates,  this 
summary  also  reflects  the  extent  to  which  the  system  design 
promotes  the  incorrect,  but  not  necessarily  irrational,  replacement 
of  parts  (as,  for  example,  when  a  relatively  inexpensive  unit  is 
provisionally  replaced  in  preference  to  lengthy  continued  testing). 
Since  the  fault  sample  is  based  upon  reliability  data,  the  total 
replacement  frequencies  reflect  both  true  failure  likelihood  and 
aspects  of  the  design  which  promote  false  replacements. 


[Figure 2 plot: histogram of repair time (diagnosis plus repair time, in
minutes; one mark per fault). n: 194; MTTR: 31.33; Std. dev.: 13.03]

Figure 2. Example Repair Time Distribution
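
The statistics reported with the repair time distribution can be computed from the per-fault projected times in a few lines. The following is a sketch under assumptions: the function names and the five-minute bin width are invented here, and the sample standard deviation is used because the report does not say whether the sample or population form was applied.

```python
import statistics
from collections import Counter


def repair_time_summary(times_min, bin_width=5):
    """Summarize projected repair times (minutes), one entry per fault."""
    bins = Counter((int(t) // bin_width) * bin_width for t in times_min)
    return {
        "n": len(times_min),
        "mttr": statistics.mean(times_min),
        "std_dev": statistics.stdev(times_min),  # sample standard deviation
        "histogram": dict(sorted(bins.items())),
    }


def print_histogram(histogram):
    """Print one '*' per fault, in the spirit of a character-based plot."""
    for start, count in histogram.items():
        print(f"{start:4d} | " + "*" * count)
```

For example, `repair_time_summary([10, 12, 20, 38])` gives a mean time to repair of 20 minutes, and `print_histogram` then shows one row per five-minute interval.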
Change Evaluation. By evaluating the maintainability status before and
after  a  contemplated  change  is  specified  to  CAD-M,  a  designer  can  determine 
the  projected  impact  of  a  wide  range  of  design  modifications.  In  this  way 
the  designer  can  explore  the  impact  of  such  design  decisions  as 
modifications  to  the  front  panel,  changes  to  the  BIT  or  ATE  systems, 
provision  of  test  points,  packaging  of  boards  and  modules,  or  selection  of 
fasteners  and  means  of  accessing  internal  parts. 


To  measure  the  effect  of  a  contemplated  design  change,  a  user  would  do 
the  following: 

1. execute CAD-M on the current design to obtain a measure of
its  maintainability  status  before  the  contemplated  changes. 

2.  create  a  copy  of  the  current  design  specifications  and 
modify  the  copy  to  reflect  the  contemplated  changes. 

3.  execute  CAD-M  on  the  modified  design,  and  evaluate  the 
differences. 

4.  if  the  designer  chooses  to  implement  the  changes,  the  modified 
specifications  become  the  current  design;  otherwise,  the 
modified  specifications  are  discarded. 
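
The four steps above amount to a copy, modify, and compare workflow. The sketch below illustrates it with a toy stand-in for the analysis; `toy_analyze`, the dictionary layout of a design, and the rule of adopting the change whenever mean repair time drops are all assumptions made for illustration, since in practice the decision in step 4 rests with the designer.

```python
import copy
import statistics


def toy_analyze(design):
    """Stand-in for a CAD-M run: repair time = access time + diagnosis time."""
    return [design["access_min"] + d for d in design["diagnosis_min"]]


def evaluate_change(current, apply_change, analyze):
    """Steps 1-3: analyze the current design, modify a copy, analyze the copy."""
    before = statistics.mean(analyze(current))
    candidate = copy.deepcopy(current)  # the current design is left untouched
    apply_change(candidate)
    after = statistics.mean(analyze(candidate))
    return candidate, before, after


def reduce_access_time(design):
    """Hypothetical change: quick-release fasteners cut access time."""
    design["access_min"] = 2.0


current = {"access_min": 6.0, "diagnosis_min": [10.0, 20.0, 30.0]}
candidate, before, after = evaluate_change(current, reduce_access_time, toy_analyze)

# Step 4: the modified copy becomes the current design only if accepted.
# Accepting whenever the mean repair time drops is an assumption made here.
adopted = candidate if after < before else current
```

Working on a deep copy keeps the original specifications intact, so discarding a rejected change requires no undo logic.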


Simplification  Analysis.  This  category  of  maintainability  analysis  is 
concerned  with  identifying  hardware  included  in  a  system  design,  strictly 
for  maintainability  purposes,  which  contributes  very  little  to  the 
serviceability.  An  indicator  or  test  jack  might  turn  out  to  be  of  no 
utility  to  the  maintainer,  or  possibly  some  features  of  a  built-in-test 
system  might  be  unnecessary.  Items  found  to  be  unused  by  CAD-M  might 
be  retained  in  a  design  for  fulfilling  other  purposes;  this  analysis 
establishes  a  list  of  those  elements  which  should  be  considered  for 
elimination. 


Unnecessary  maintenance  hardware  is  distinguished  by  a  zero  frequency 
of  use  in  the  CAD-M  Test  Usage  Summary,  Figure  5.  In  this  example,  all 
front  panel  indicators  were  used,  but  a  number  of  test  points  were  not. 
Figure 5. Test Usage Summary.
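
A usage tally of this kind can be sketched as a simple count over the generated action sequences. The function names and the sample test names below are hypothetical; the only idea taken from the text is that a zero frequency of use marks a candidate for elimination.

```python
from collections import Counter


def usage_summary(available_tests, sequences):
    """Count how often each available test appears in the action sequences."""
    used = Counter(test for seq in sequences for test in seq)
    return {test: used[test] for test in available_tests}


def elimination_candidates(usage):
    """Tests never used in any sequence: candidates for elimination."""
    return sorted(test for test, count in usage.items() if count == 0)


# Illustrative data: two front panel indicators and two test points.
tests = ["power_lamp", "disk_led", "TP1", "TP2"]
sequences = [["power_lamp", "TP1"], ["power_lamp", "disk_led"]]
usage = usage_summary(tests, sequences)
```

Here `elimination_candidates(usage)` lists only `TP2`, the one test point that no projected sequence ever used.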

A second type of analysis, shown in Figure 6, displays the relative
power of the fault diagnosis features of a design for identifying faults.
In this analysis U-REDCT is the total uncertainty reduction contributed by
each  test,  over  the  sample  of  faults  analyzed.  This  is  a  measure  of  the 
extent  to  which  the  test  aided  in  identifying  the  faults  in  the  sample.  The 
U/TIME  column  displays  the  fault  isolation  power  divided  by  the  time 
required  to  perform  the  test. 
Figure 6. Test Power Analysis
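
The report does not give the formula behind U-REDCT, so the following sketch should be read as one possible interpretation only. It assumes uncertainty is measured in bits as the base-2 logarithm of the number of equally likely suspects, and that U/TIME divides the total reduction by the time for one performance of the test; the record layout and function names are likewise invented.

```python
import math
from collections import defaultdict


def uncertainty(n_suspects):
    """Uncertainty in bits, assuming equally likely suspects (an assumption)."""
    return math.log2(n_suspects) if n_suspects > 0 else 0.0


def power_analysis(records, test_times_min):
    """Total uncertainty reduction per test, and reduction per minute.

    records: (test, suspects_before, suspects_after) for each test use.
    Returns {test: (u_redct, u_per_time)}.
    """
    u_redct = defaultdict(float)
    for test, before, after in records:
        u_redct[test] += uncertainty(before) - uncertainty(after)
    return {
        test: (total, total / test_times_min[test])
        for test, total in u_redct.items()
    }


# Illustrative records: a quick indicator check and a slower test point.
records = [("power_lamp", 8, 4), ("TP1", 4, 1), ("power_lamp", 8, 2)]
times = {"power_lamp": 0.5, "TP1": 4.0}
result = power_analysis(records, times)
```

In this toy data the indicator contributes 3 bits in total at 6 bits per minute, while the test point contributes 2 bits at 0.5 bits per minute, which is the kind of contrast the two columns are meant to expose.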


Critical  Problem  Identification.  Critical  maintainability  problems 
would  be  evidenced  by  excessive  repair  times  or  excessive  false 
replacements.  The  determination  of  just  what  repair  time  or  false 
replacement  rate  is  excessive  is  a  subjective  one,  which  the  designer  or 
logistics  specialist  must  make.  The  identification  of  faults  which  are 
unusually  difficult  to  resolve  would  begin  by  examining  the  maintenance  time 
listing  shown  in  Figure  7.  Here  the  designer  sees  the  total  diagnosis  and 
repair  time  projected  for  each  fault  in  the  sample.  If  some  shared 
characteristics  were  noticed  about  many  of  the  faults  found  to  be  difficult 
to  resolve,  the  designer  might  request  and  examine  detailed  problem 
summaries,  which  provide  the  step-by-step  sequence  of  projected  testing 
actions  for  those  faults. 
Figure 7. Maintenance Time Listing

As  shown  in  Figure  8,  the  detailed  diagnosis  and  repair  sequence  lists 
the  testing  sequence  projected  for  the  fault  with  the  times  to  perform  the 
associated  maintenance  actions.  From  this,  the  designer  may  determine 
whether  the  repair  time  resulted  from  a  difficulty  in  identifying  the  fault 
or  from  a  difficulty  in  effecting  the  repair  or  adjustment,  or  both.  In 
some  cases,  the  analysis  may  show  that  a  group  of  excess  repair  times  is  a 
result  of  mis-diagnosis  which  might  be  rectified  by  providing  additional 
test  points  or  displays. 
Figure 8. Detailed Diagnosis and Repair Sequence


The  CAD-M  technique  has  potential  for  exploring  general  principles  of 
design  for  maintainability.  Such  principles  could  emerge  as  a  result  of 
long-term  application  to  a  range  of  design  projects,  or  they  may  be  the 
result  of  studies  in  which  design  variables  are  systematically  manipulated. 
This  section  will  briefly  discuss  some  of  the  types  of  questions  which  may 
be  addressed  in  this  manner. 

Test  Power.  A  general  question  concerning  the  design  of  tests  for 
fault  diagnosis  concerns  the  advisability  of  providing  many  relatively  weak, 
but  easily  interpreted,  tests  versus  fewer,  more  powerful,  tests.  There  may 
be  some  range  of  test  power  which  allows  easy  interpretation  of  symptoms, 
but which avoids excessive testing steps. A related question concerns the
provision of test points versus front-panel indicators. Insights into the
relative benefits of front panel indicators would be useful in determining
when their added cost is warranted.

Level  of  Built-in  Test.  Experimentation  with  CAD-M  may  shed  light  on 
questions  concerning  the  level  of  fault  isolation  which  is  most  appropriate 
to  address  with  BIT,  as  opposed  to  manual  troubleshooting  procedures.  While 
generalities  may  be  difficult  to  realize  in  this  area,  designers  may  obtain 
useful  information  regarding  the  times  required  in  manual  troubleshooting 
for  various  phases  of  diagnosis.  Such  data  could  be  useful  in  determining 
the  proper  extent  of  a  BIT  capability. 


Accessibility and Modularity. Designers often have many
options concerning the packaging of hardware and the means by which
sub-units  are  accessed.  Typically,  the  designer  can  estimate  the 
approximate  cost  difference  among  such  alternatives,  but  has  very  little 
data  on  the  maintainability  consequences.  For  example,  what  is  the  payoff 
in  mean  repair  time  for  each  minute  reduction  in  gaining  access  to  internal 
test  points?  Or,  how  does  Mean  Time  to  Repair  vary  as  the  component  count 
on  circuit  boards  varies? 


Other  higher-level  generalities  may  emerge  which  have  implications  in 
other  aspects  of  equipment  availability.  One  productive  line  of  inquiry 
would  be  to  investigate  the  sensitivity  of  repair  times  to  the  efficiency  of 
the  diagnostic  strategy,  and  to  the  correctness  of  the  symptom 
interpretations.  Our  tentative  finding,  based  upon  just  three  applications 
of  CAD-M,  is  that  repair  times  are  not  highly  sensitive  to  efficiency,  but 
are  highly  affected  by  symptom  interpretation  accuracy.  If  this  tentative 
finding  holds  up  to  thorough  experimentation,  it  would  have  implications  for 
both  designers  and  trainers. 

A  second  attitude  which  we  are  coming  to  embrace  as  a  result  of 
applying  CAD-M  is  that  the  design  of  equipment  may  be  responsible  for  many 
more  false  replacements  than  is  currently  recognized.  This  suspicion  is  a 
result  of  observing  a  substantial  false  replacement  rate  when  CAD-M  applies 
an  entirely  rational  diagnostic  strategy  to  some  designs.  The  general 
opinion  in  the  maintenance  world  seems  to  be  that  false  replacements  are 
almost  entirely  the  result  of  poor  technician  ability  or  training. 
SECTION III. TECHNIQUES FOR SPECIFYING SYSTEM DESIGNS


The  input  data  required  to  execute  the  PROFILE  model  constitute  a 
well-defined  specification  of  the  information  which  must  be  supplied  to 
support  analysis  of  maintainability.  In  the  experimental  applications  to 
date,  the  required  alphanumeric  data  have  been  prepared  to  describe  a 
particular  system  design,  and  have  been  entered  via  keyboard  in  the  form 
shown  in  Appendix  A. 

The  two  major  portions  of  the  current  specification  format  are:  1)  the 
fault-effects  array,  which  relates  possible  failures  to  their  symptoms,  and 
2)  the  listing  of  symptoms  for  the  specific  faults  comprising  the  sample  to 
be  analyzed. 

Limitations  of  Manual  Techniques 

Unfortunately, considerable expertise and effort are required to
formulate  and  enter  these  two  data  sets.  Both  can  become  voluminous  and 
complex  for  a  large  system,  and  they  may  require  an  analysis  of  fault 
effects  more  extensive  than  that  required  to  accomplish  the  functional 
design  of  the  system.  This  could  present  a  serious  obstacle  to  effective 
application  of  CAD-M,  as  organizations  may  not  be  inclined  to  expend  the 
resources  required  to  meet  non-operational  objectives. 

Automating  the  Generation  of  Specification  Data 

A  central  consideration  of  this  study  has  been  the  feasibility  of 
generating  the  required  fault-effect  data  from  more  easily  produced  system 
descriptions.  Specifically,  we  have  explored  the  means  by  which  the  data 
might  be  produced  by  a  computer-based  simulator  operating  upon  graphic 
representations  of  the  system's  functional  structure  and  organization. 


This  section  will  describe  the  necessary  inputs  to  the  simulator 
and  briefly  outline  how  it  will  generate  the  necessary  data  for  PROFILE. 

The  primary  functions  of  the  program  are  to  select  and  simulate  faults 
in  the  representation  of  the  system  design.  In  this  way  it  discovers  and 
stores the effects of all possible 'element failures' in the system, i.e.,
all possible failures within hardware elements which cause one or more of
their outputs to be abnormal. This class of failures can include breaks in
signal lines, but it does not include failures which alter the structure of
the system, i.e., two or more signal lines becoming incorrectly connected
(short-circuits).  Such  failure  effects  could  be  obtained  by  altering  the 
connectivity  data,  described  below,  to  reflect  the  altered  system  structure, 
but  this  would  require  involvement  by  the  user. 

The  simulator  will  initiate  the  analysis  of  a  selected  failure  by 
determining  how  the  failed  element  will  behave  in  the  selected  failure 
mode.  It  will  then  trace  the  effects  of  the  abnormal  outputs  throughout  the 
system.  The  tracing  of  effects  involves  recognizing  the  connectivity  of 
system  elements,  to  determine  the  path  of  effects,  and  it  involves 
simulation  of  the  other  system  elements,  to  compute  how  they  will  react  to 
abnormal  inputs.  Finally,  the  simulator  will  determine  what  symptoms  will 
appear  to  the  maintainer  under  various  testing  conditions.  From  this,  it 
will  construct  the  required  fault-effects  matrix  and  the  sample  of  specific 
faults,  in  the  form  shown  in  Appendix  A. 
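
The tracing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the simulator itself: the element names, signal names, and dictionary layout are hypothetical, and every element is assumed to produce all-abnormal outputs whenever it fails or receives any abnormal input. The resulting table is likewise not in the Appendix A format.

```python
from collections import deque

# Hypothetical system description: element name -> input and output signal names.
ELEMENTS = {
    "E1": {"inputs": ["s1"], "outputs": ["s2"]},
    "E2": {"inputs": ["s2"], "outputs": ["s3", "s4"]},
    "E3": {"inputs": ["s5"], "outputs": ["s6"]},
}


def abnormal_signals(elements, failed):
    """Return the set of signals made abnormal by a failure of `failed`.

    Assumption: a failed element, and any element receiving an abnormal
    input, produces abnormal signals on all of its outputs.
    """
    # Index which elements consume each signal (the connectivity data).
    consumers = {}
    for name, element in elements.items():
        for signal in element["inputs"]:
            consumers.setdefault(signal, []).append(name)

    abnormal = set()
    queue = deque(elements[failed]["outputs"])
    while queue:
        signal = queue.popleft()
        if signal in abnormal:
            continue  # already traced; also guards against feedback loops
        abnormal.add(signal)
        for name in consumers.get(signal, []):
            queue.extend(elements[name]["outputs"])
    return abnormal


def fault_effects_matrix(elements, observable):
    """Map each possible element failure to the symptom at each observable signal."""
    matrix = {}
    for name in elements:
        abnormal = abnormal_signals(elements, name)
        matrix[name] = {signal: signal in abnormal for signal in observable}
    return matrix
```

Calling fault_effects_matrix(ELEMENTS, ["s4", "s6"]) yields one row per element failure, with True marking an abnormal reading at that test point.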

The Specification Technique

The  three  primary  elements  of  data  required  to  specify  the  functional 
organization  of  a  system  design  will  be  as  follows: 

1.  a  definition  of  each  'basic'  element  in  the  system. 

2.  data  describing  the  system  connectivity,  i.e.,  the  routing  of 
element  outputs  to  other  elements. 

3.  a  definition  of  the  functional  hierarchy  of  the  system. 


Other data representing the physical construction of the system remain
as described in earlier reports. These data include the reliabilities of
the  basic  system  elements,  the  approximate  costs  of  the  replaceable  units, 
and  the  physical  structure  of  the  system. 

Basic  Element  Definitions.  A  basic  element  is,  by  definition,  a  level 
of  system  organization  which  is  not  further  defined  in  terms  of  a  more 
detailed  network  description.  Thus  basic  elements  compose  the  lowest  level 
of  system  description. 

The definition of each basic element will include its name, names of
its  inputs  and  outputs,  and  a  rule  describing  its  possible  faulty  behaviors, 
as  described  below. 

The  user  will  decide  which  elements  in  a  system  shall  be  regarded  as 
'basic'. These will be elements whose behavior is relatively simple and
whose  internal  structure  is  not  a  consideration  of  the  designer  (or  is  not 
yet  a  consideration  of  the  designer).  This  freedom  to  establish  the  basic 
building  blocks  of  a  system  design  at  any  level  can  be  exploited  to  reduce 
the  quantity  of  detail  supplied,  thereby  facilitating  analysis  of  designs 
long  before  the  details  have  been  worked  out. 

Basic  elements  might  be  individual  components,  or  possibly  standard 
circuits  or  subassemblies  which  are  employed  without  modification.  For 
example,  a  complete  power  supply  might  be  regarded  as  a  basic  element  if  its 
behavior  is  relatively  simple  (such  as  any  failure  causes  an  abnormal 
output), and the designer is not concerned with its internal makeup.

Generally,  a  complicated  element  with  many  outputs  and  failure  modes 
would  be  described  as  a  network  of  basic  elements  or  other  networks,  rather 
than  as  a  basic  element.  In  this  way,  very  complex  systems,  and  resulting 
complex  behavior,  can  be  represented  via  a  network  of  simpler  elements. 

The complexity of an element's behavior is reflected by the type of
rule  required  to  specify  its  possible  modes  of  failure.  Two  standard  rules 
of  failure  behavior  will  be  built  in,  and  can  be  selected  by  the  user  to 
describe  any  basic  element.  The  first  rule  states  that  any  failure  of  the 
element  causes  all  of  its  outputs  to  be  abnormal.  The  second  built-in  rule 
states  that  each  one  of  the  outputs  can  be  abnormal,  with  an  equal 
probability. 

For  analyses  of  preliminary  designs,  the  CAD-M  user  may  select  either 
of  these  rules  to  apply  to  all  basic  elements,  thereby  avoiding  this  aspect 
of  the  specification  altogether.  Alternatively,  either  of  the  basic  rules 
may  be  selected  for  each  element.  In  the  most  complex  case,  when  maximum 
accuracy  is  desired,  the  CAD-M  user  may  define  a  unique  rule  for  any 
element,  which  states  just  what  combination  of  outputs  can  be  abnormal,  and 
the  approximate  probability  of  those  combinations. 
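
The three levels of failure rule described above might be represented as follows. This is a minimal sketch with hypothetical function names. It reads the second built-in rule as "exactly one output abnormal per failure, each equally likely", and it assumes that a user-defined rule lists mutually exclusive combinations whose approximate probabilities sum to one. Neither reading is stated in this report.

```python
def rule_all_outputs(outputs):
    """Built-in rule 1: any failure makes every output abnormal."""
    return [(frozenset(outputs), 1.0)]


def rule_single_output(outputs):
    """Built-in rule 2 (as interpreted here): one output abnormal, equal probability."""
    p = 1.0 / len(outputs)
    return [(frozenset([output]), p) for output in outputs]


def rule_custom(modes):
    """User-defined rule: explicit (abnormal outputs, approximate probability) pairs."""
    total = sum(p for _, p in modes)
    if abs(total - 1.0) > 1e-9:
        # Assumption of this sketch: the listed combinations are exhaustive.
        raise ValueError("failure mode probabilities must sum to 1")
    return [(frozenset(outputs), p) for outputs, p in modes]
```

Each rule returns a list of (set of abnormal outputs, probability) pairs, so a simulator could treat built-in and user-defined rules uniformly.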

System  Connectivity.  The  inputs  and  outputs  defined  for  each  basic 
element  provide  the  connectivity  information  required  to  trace  failures  to 
their  effects.  These  data  reflect  what  inputs  enter  the  system  from  the 
outside  world,  how  these  inputs  pass  through  the  system,  and  what  outputs 
are  measurable  at  test  points  or  front  panel  indicators. 

Functional  Hierarchy.  The  functional  hierarchy  of  a  system  specifies 
how  basic  elements  are  combined  to  form  higher  level  functional  units, 
how  these  are  combined  at  higher  levels,  and  so  on.  Ultimately,  the 
total  system  may  be  represented  as  a  configuration  of  a  relatively  small 
number  of  lower-level  networks. 
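
One simple way to hold such a hierarchy is to give each unit a dotted name, in the manner of the names A, A.B, and A.B.A used in this report's example. The sketch below assumes that convention. The helper names and the element A.B.B are hypothetical, added only for illustration.

```python
# Hypothetical set of units; "A.B.B" is invented for this illustration.
NAMES = ["A", "A.A", "A.B", "A.B.A", "A.B.B"]


def parent(name):
    """Return the enclosing functional unit, or None for the top-level system."""
    head, sep, _ = name.rpartition(".")
    return head if sep else None


def children(names, unit):
    """Return the immediate sub-elements of `unit`."""
    prefix = unit + "."
    return sorted(
        n for n in names
        if n.startswith(prefix) and "." not in n[len(prefix):]
    )


def basic_elements(names):
    """Return the elements that are not further defined as networks."""
    return sorted(n for n in names if not children(names, n))
```

With this layout, a unit is a basic element simply because no network description has been supplied beneath it, which matches the freedom to stop the description at any level.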

The role of the functional hierarchy is to partially compartmentalize
information for PROFILE so that, at any stage in its fault diagnosis, it 1)
restricts its search for faults to the element currently under consideration,
and 2) encounters incomplete information about the behavior of an element,
which must either be resolved by exploring the sub-structure of the element
or be endured by limiting the power of the conclusions drawn from
test results.

When  a  system  is  specified  as  a  hierarchical  structure  of  basic 
elements,  very  complex  system  behavior  can  be  discovered  by  the  simulator, 
as  a  result  of  analyzing  the  propagation  of  fault  effects  through  the 
functional  units.  This  fault  analysis  may  well  be  a  product  of  value  in  its 
own  right,  as  well  as  providing  the  necessary  ingredients  to  PROFILE. 

As  a  simple  example  of  the  inferencing  of  fault  effects,  Figure  9  shows 
a  portion  of  a  two-level  hierarchy;  the  top-level  system  is  labeled  A,  and 
one  of  its  sub-elements,  A.B,  is  shown  in  further  detail.  Assuming  that 
all  sub-elements  of  A.B  are  basic  elements,  and  that  they  follow  the 
simplest  failure  mode  rule  (any  failure  produces  all  abnormal  outputs),  the 
following  inferences  may  be  made  about  the  effects  of  two  particular 
failures: 

Failure in A.B.A: The abnormal signals in A.B will be 9, 10, 11, 12, and 4.

The abnormal signals in A will be 4, 5, 6, 7, and 8
(signal 4 in A is identical to signal 4 in A.B).

Failure in A.A: The abnormal signals in A will be 3, 5, 6, 4, 7, and 8.

The abnormal signals in A.B will be 11 and 4.

This is the type of inferencing which existing artificial
intelligence systems, such as PROLOG (Clocksin & Mellish, 1981), can perform.
Our experimental applications of PROLOG have led us to conclude, however,
that  CAD-M  requires  a  simulator  developed  specifically  to  analyze 
hierarchical  structures  such  as  that  shown  in  Figure  9.  The  two  primary 
advantages  of  developing  such  a  capability  will  be  much  faster  execution 
speed  and  a  great  reduction  in  the  quantity  of  data  required  to  represent  a 
system.  Both  of  these  advantages  will  result  from  building  processes  into 
the  simulator  which  would  otherwise  be  represented  as  data  to  a  highly 
general-purpose  system  such  as  PROLOG. 


One example of the type of mechanism which will be included in the
simulator  concerns  multiple  faults,  or  more  specifically  cascading  faults 
(the  causation  of  faults  by  other  faults,  as  opposed  to  randomly  occurring 
multiple  faults).  A  simple  input  item  could  specify  that  a  failure  of 
some  type  in  one  element  could  cause  a  failure  in  another  element.  The 
simulator  will  process  these  simple  entries,  and  will  generate  fault 
effect  data  which  recognizes  the  probabilities  of  the  cascading  failure 
event.  While  the  same  operation  could  be  generated  in  PROLOG,  the 
data  would  have  to  supply  PROLOG  with  all  the  mechanisms  by  which  it 
generates  the  dependent  failures  and  their  effects. 
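
As an illustration of how little input such a cascading-fault entry might need, the sketch below expands a table of "failure of X can cause failure of Y with probability p" entries into the possible sets of failed elements. The element names, the table layout, and the 0.2 probability are hypothetical. The sketch also assumes that only direct (one-step) cascades are considered and that multiple cascades from one element are independent.

```python
# Hypothetical input: a failure of "PS" causes a failure of "AMP" with probability 0.2.
CASCADES = {"PS": [("AMP", 0.2)]}


def cascade_outcomes(primary, cascades):
    """Enumerate (set of failed elements, probability) for a primary failure.

    Assumptions: one-step cascades only, each treated as an independent event.
    """
    outcomes = [(frozenset([primary]), 1.0)]
    for target, p in cascades.get(primary, []):
        expanded = []
        for failed, prob in outcomes:
            expanded.append((failed | {target}, prob * p))   # cascade occurs
            expanded.append((failed, prob * (1.0 - p)))      # cascade does not occur
        outcomes = expanded
    return outcomes
```

Each outcome could then be traced through the system as a combined failure, with its fault effects weighted by the outcome's probability.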


CAD  systems  for  electronics  design  vary  greatly,  but  in  general  they 
supply  information  concerning  1)  the  appearance  of  a  physical  system,  as  a 
collection  of  lines  or  more  complex  graphical  entities,  2)  the  electrical 
connectivity  of  points  in  circuits,  in  the  form  of  wiring  lists,  and  3)  some 
information  concerning  the  sub-structures  making  up  the  system. 

Sophisticated  electronic  CAD  systems  also  have  the  capabilities  to 
model  the  operation  of  low-level  components,  allowing  a  functional  analysis 
of  the  operation  of  circuits  and  collections  of  circuits.  Unfortunately, 
the  simulation  accomplished  for  design  purposes  differs  in  several  important 
respects  from  the  kind  required  to  support  the  PROFILE  model.  Electronic  CAD 
systems require data about components which are far more detailed than those
required to support fault effect simulation. Moreover, the specification of the
system  must  be  complete,  at  the  very  lowest  levels,  before  modeling  of 
circuit  behavior  can  be  initiated.  As  a  result  CAD  is  typically  employed 
for  the  design  of  individual  low-level  circuits,  rather  than  for  simulating 
the  high-level  behavior  of  the  complete  system. 


Furthermore,  the  types  of  results  produced  by  electronic  CAD  systems 
are quite different from those required by PROFILE. The CAD results provide
detailed  timing  diagrams,  voltage  levels,  and  other  electronic 
characteristics,  rather  than  the  symptom  information  available  at  indicators 
and  test  points. 

A  final  limitation  of  electronic  CAD  systems,  for  generating  fault 
effect  data  for  PROFILE,  is  that  their  application  is  restricted  to 
electronic  systems.  A  more  generic  resource  would  be  preferred. 

It  is  for  these  reasons  that  a  general-purpose  fault-effects  simulator, 
as  outlined  above,  is  required  within  CAD-M. 

The  development  of  a  simulator  of  this  type  will  accomplish  two  major 
objectives:  1)  it  will  facilitate  the  linking  of  CAD-M  to  commercial 
electronic  CAD  systems  gaining  wide  use  in  industry,  and  2)  it  will  present 
a  non-CAD  user  with  a  workable  approach  with  which  to  supply  design 
specifications. 

Following  development  of  this  simulator,  linking  CAD-M  to  a  particular 
conventional  CAD  system  will  require  the  development  of  a  minimal, 
special-purpose  interface  between  the  CAD  system  and  CAD-M.  The  particular 
transformations  required  will  depend  upon  the  CAD  system  involved;  in  most 
cases  the  extent  of  transformation  is  expected  to  be  quite  small. 

Two types of interface are possible: 1) a 'pipe' through which are sent
the  data  required  by  CAD-M,  or  2)  an  online  'bus'  by  which  CAD-M  is  able  to 
receive  data  as  it  is  developed  on  the  CAD  system.  The  former  approach  may 
be  accomplished  without  requiring  access  to  the  inner  structure  of  the  CAD 
software;  the  latter  approach  would  require  involvement  by  the  CAD 
developer. To establish clear interfacing specifications we will prepare
a formal definition of the data requirements of CAD-M, following the
philosophy of the Initial Graphics Exchange Specification (Smith, Bradford,
& Wellington, 1983).


SECTION IV. SUMMARY AND CONCLUSIONS


Application  of  CAD-M 

The  CAD-M  functions  described  in  Section  II  have  been  developed  to  promote 
the  discovery  of  maintainability  problems,  the  analysis  of  design  options,  and 
the  projection  of  expected  maintenance  workload.  No  assumptions  have  been  made 
concerning  exactly  who  in  the  design  team  might  employ  the  process,  when  CAD-M 
might  enter  the  design  phase,  or  exactly  how  it  would  be  applied.  The  intention 
has  been  to  develop  a  system  which  does  not  require  a  highly  structured 
application  procedure. 

A  crucial  underlying  criterion,  however,  was  that  CAD-M  address  design 
issues  which  are  largely  under  the  control  of  the  designer,  and  issues  which  are 
not  deeply  intertwined  with  achieving  the  intended  operational  requirements  of 
the  system  under  design. 

In  some  development  environments  CAD-M  might  appropriately  be  integrated 
closely  into  a  CAD  system,  providing  maintainability  analyses  to  the  designer  as 
the  specifications  are  altered  within  the  CAD  system.  In  other  settings,  the 
technique  might  be  applied  as  a  discrete  analysis  phase,  possibly  by  a  team 
concentrating  on  logistics  issues.  In  either  case,  an  essential  capability  of 
CAD-M  is  that  it  will  allow  the  analysis  of  preliminary  design  specifications 
when  details  are  not  yet  established,  and  gradual  refinement  of  maintainability 
projections  as  the  details  of  the  design  evolve. 

Future Research

The  simulator  described  in  Section  III  will  be  implemented  in  the  coming 
year.  Two  alternate  modes  of  data  entry  are  planned,  an  alphanumeric  mode  and  a 
graphical  mode.  The  alphanumeric  form  will  be  developed  first,  and  will  accept 
input  data  which  convey  the  functional  topology  of  the  system  design.  This  mode 
of operation is important as it represents the most general interface between
PROFILE and existing CAD systems, i.e., the outputs of existing CAD systems are
similar to the alphanumeric inputs required by the simulator.

The  graphical  input  capability  will  be  developed  to  facilitate  use  of  CAD-M 
as  a  stand-alone,  computer-aided  system  for  maintainability  design.  The  graphic 
editing  features  will  be  restricted  to  those  required  to  specify  the  functional 
hierarchy,  as  described  in  Section  III. 

The  final  ingredient  of  CAD-M  to  be  developed  is  a  technique  for  projecting 
the cognitive time component of diagnosis and repair operations. Preliminary
regression  studies  indicate  that  acceptably  accurate  cognitive  time  estimates 
can  be  made  using  the  manual  testing  projections  of  PROFILE  as  a  basis.  The  key 
factors  which  have  been  identified  as  significant  variables  are  (in  order  of 
decreasing  significance)  1)  the  manual  time  projected  by  PROFILE  to 
perform the fault-isolation tests, 2) the number of replacements made
to  resolve  the  failure,  and  3)  the  number  of  unique  indicators,  including 
test  points,  examined  to  isolate  the  fault. 
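
A projection of this kind could take the following form. The linear form is an assumption of this sketch, since the report does not give the form of the regression, and the coefficients are left as parameters because no fitted values are reported here.

```python
def cognitive_time(manual_time, replacements, indicators, coefficients):
    """Project cognitive time from the three PROFILE-derived variables.

    Assumption: a linear model. `coefficients` is (intercept, b_manual,
    b_replacements, b_indicators), to be supplied from a regression
    analysis; no values are given in this report.
    """
    intercept, b_manual, b_replacements, b_indicators = coefficients
    return (
        intercept
        + b_manual * manual_time
        + b_replacements * replacements
        + b_indicators * indicators
    )
```

Any call must supply its own coefficients; for example, placeholder values (1.0, 0.5, 2.0, 1.0) serve only to exercise the function and carry no empirical meaning.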

The  precision  with  which  cognitive  time  is  predicted  may  be  improved  by 
adding  some  measure  of  system  complexity.  Previously,  the  data  available  to 
PROFILE  have  not  reflected  the  functional  complexity  of  the  system.  With  the 
implementation  of  the  hierarchical  representation  described  in  Section  III,  an 
opportunity  will  exist  to  examine  the  internal  complexity  of  the  system  design. 
Such  factors  as  linearity  of  system  structure,  multiplicity  of  failure  modes, 
and  predictability  of  fault  effects  may  play  an  important  role  in  projecting 
the  cognitive  workload  associated  with  fault  diagnosis.  All  of  these  will 
be  measurable  from  the  data  structures  to  be  employed  in  CAD-M. 


APPENDIX A


FAULT  EFFECTS  DATA 